10 node AFF NetApp cluster nodes highly utilized and unable to set maintenance window

Hi friends, need your valuable suggestions as always. I have a 10-node cluster which is highly utilized at all times. Among those, 2 nodes are hitting 80% utilization on a regular basis. As this is a critical cluster, I am unable to set a maintenance window for the ONTAP upgrade. Vol move activity is not possible at the moment, as the cluster needs to be upgraded by next week. Please let me know how to proceed with the maintenance window. Are there any critical parameters, like IOPS or latency, that I can look at to assess performance and decide on a maintenance window? It should be a non-disruptive upgrade, and the host team should not have any downtime during the activity. The ONTAP version upgrade is planned from 9.11.1p8 to 9.11.1p16 to 9.15.1p16; it is a multi-hop upgrade.
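A minimal sketch of the pre-checks I would run before picking a window, assuming cluster-shell access and placeholder node names (verify exact syntax and parameters for your ONTAP release):

statistics show-periodic -node node1 -interval 5 -iterations 12
qos statistics volume latency show -iterations 10
node run -node node1 sysstat -x 5
cluster image validate -version <target-package-version>
cluster image show-update-progress

The first three show CPU utilization, per-volume latency, and front-end IOPS over time, which helps identify the quietest recurring period for the two busy nodes; the last two are the automated non-disruptive upgrade (ANDU) pre-validation and progress views. During each HA pair's takeover/giveback the surviving partner carries both nodes' load, so the pair that regularly hits 80% is the one to schedule into the lowest-load slot you can find.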
At CES 2026 in Las Vegas, NVIDIA unveiled several groundbreaking technologies to accelerate AI adoption—not just improving performance but also providing greater energy efficiency. One standout announcement was the Inference Context Memory Storage (ICMS) platform, introduced during Jensen Huang's keynote.

What is ICMS and Why Does it Matter?

ICMS targets gigascale AI inference environments and large AI factories, where models increasingly rely on iterative, multi-step reasoning that reuses context repeatedly. Traditionally, this context—stored in the KV cache (Key-Value Cache)—resided in each GPU's high-bandwidth memory (HBM). In long-context, multi-turn inference, GPUs often regenerate the same context multiple times, wasting compute cycles and energy. By enabling GPUs to retain and share context memory across inference processes, ICMS dramatically boosts tokens-per-second throughput and improves energy efficiency. For a technical deep dive, see NVIDIA's blog: Introducing NVIDIA BlueField-4 Powered Inference Context Memory Storage.

Where does ICMS fit in the Memory and Storage Hierarchy?

ICMS adds a new tier to the AI memory/storage hierarchy, designed specifically for transient inference context data. Current tiers include:

GPU memory (HBM) – nanosecond access
System RAM
Local flash storage (SSD)
Network storage – large-scale, shared datasets, protected, highly available, with enterprise-grade data management. There are hierarchies within the network storage layer as well (hot, warm, cold) to meet the performance requirements of applications at the lowest price and highest power efficiency.

As the context cache grows beyond GPU memory capacity, it spills into RAM and SSDs. ICMS introduces a new "tier 3.5" layer to scale the mixture-of-experts (MoE) models involved in multi-step inferencing operations. The KV cache is ephemeral and can be recreated, though recreation adds latency for users. For a scalable inferencing system relying on hundreds of agents, each creating its own context, it is important to save the query history, answer history, and context for any paused conversations on enterprise storage systems. Furthermore, AI still depends on durable, secure, high-performance enterprise storage for reference datasets, model training data, and other mission-critical assets. Protecting that data remains essential, and NetApp delivers industry-leading secure storage to meet those needs.

NVIDIA ICMS, NVIDIA BlueField-4, and the NetApp AI Data Engine

The NetApp AI Data Engine (AIDE) is an end-to-end, storage-integrated AI data service that simplifies and secures your AI data pipelines. AIDE helps you find relevant data across your enterprise, keeps the data in your pipeline current, secures it with guardrails, and transforms your data for use with apps or agents. ICMS is powered by the NVIDIA BlueField-4 processor. This combination enables direct and efficient access to shared, external storage by GPU clusters. The architecture makes KV entries available to GPUs quickly from networked storage, so inferencing processes don't need to recalculate the context. ICMS and BlueField-4 expand KV context reuse by storing entries in shared storage accessible to all GPU nodes in the cluster. The location index of KV entries is stored in NVIDIA BlueField-4 memory and synchronized across GPU nodes by NetApp storage orchestration.
NetApp storage orchestration on the GPU node can help balance the load across all the external storage elements, while GPUDirect RDMA transfer from storage to GPU minimizes the latency of accessing KV cache entries. AIDE, as part of this architecture, further helps with low-latency inferencing. The knowledge graph that AIDE can export can act as an aggregator and work with Retrieval-Augmented Generation (RAG) pipelines and other data sources to provide the required context for the prompt. Additionally, since the prompt and context pass through the aggregator before reaching the LLM, it can help predict the tokens and pass hints to the storage engine to prefetch the proper KV entries. NVIDIA BlueField-4, ICMS, and NetApp AIDE align to streamline AI data pipelines, accelerate TTFT (Time to First Token), and reduce $/token.

Looking Ahead

Per NVIDIA's announcement, ICMS is expected to ship later in 2026. Over the next few weeks we will continue to share more details on how NetApp and NVIDIA technologies can be combined to help customers maximize the performance, efficiency, and value of their AI investments, addressing both training and inferencing workstreams. NetApp and NVIDIA have an extensive, long-standing partnership, and we are working closely to deliver the best co-designed solutions to help customers on their business transformation journeys, leveraging the power of AI. Stay tuned for more on how NetApp will support the new BlueField-4-based architectures for AI workloads.
Hi Team, I am inquiring about the configuration maximums for NetApp ONTAP storage, specifically in the context of Kubernetes and OpenShift environments. My focus is on the following limits:

The maximum number of NFS shares that can be created on an ONTAP system.
The maximum number of Block Volumes (e.g., iSCSI/FC LUNs) that can be created on an ONTAP system.
The maximum supported size of a single Block Volume on an ONTAP system.

The background for these questions is to understand the physical storage constraints that apply to Kubernetes/OpenShift users. When users request storage space via a Persistent Volume Claim (PVC), the request (including the desired Persistent Volume (PV) size) is passed from the container platform layer to ONTAP through Trident, NetApp's CSI driver. While Kubernetes/OpenShift does not impose inherent limits on the count or size of PVCs/PVs, the underlying hardware storage system's limitations will ultimately govern these maximums. Therefore, I need to know the specific configuration maximums of ONTAP to inform our planning and deployment. Thank you for your assistance.
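A minimal sketch of how the planning numbers could be grounded, assuming an SVM named svm1 and a Trident install in the trident namespace (both placeholders); the authoritative per-platform, per-version maximums live in the NetApp Hardware Universe (hwu.netapp.com) rather than in any single CLI output:

volume show -vserver svm1
lun show -vserver svm1
tridentctl -n trident get volume

The first two commands enumerate the FlexVols and LUNs already consumed against whatever the platform maximums turn out to be, and the tridentctl view shows which of those were provisioned through the CSI layer, so current consumption can be compared with the published limits for the specific controllers and ONTAP release in use.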
Hello, I want to run ndmpcopy from one volume to another on the same NetApp, but I want to make sure that files that may already exist on the destination are not duplicated and do not fill the volume to 100%. Earlier I ran:

node run -node node1 ndmpcopy -sa ndmpuser:yMGg5d0LyUG8l1kn -da ndmpuser:yMGg5d0LyUG8l1kn 10.63.107.200:/svm2/vol1 10.63.107.200:/svm2/vol2

But this resulted in the destination volume being filled to 100%. Is there a way to make sure this doesn't happen? The source volume had 500GB of storage and was 66% full; the destination volume is also 500GB and was at 25% full.
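A minimal sketch of the space check I would do before (and while) running the copy, using the volume names from the post; a full ndmpcopy is a baseline dump/restore, so it needs roughly the source's used space free on the destination on top of whatever already lives there:

volume show -vserver svm2 -volume vol2 -fields size,used,available,percent-used
volume show-space -vserver svm2 -volume vol2

With about 330GB (66% of 500GB) coming from the source and about 125GB already on the destination, the destination was always going to land above 90% of its 500GB, before even counting snapshot reserve or snapshots taken during the copy. Cleaning up or growing the destination first, copying into an empty volume, or copying one qtree/subdirectory at a time would avoid the overfill.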
I am wondering what the community at large is using for gathering and/or data-mining their NetApp product logs. I'll start: I fall into the "not yet, but it's topping my on-the-side to-do list and I've started looking" camp.
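For anyone in the same boat, a minimal sketch of where I'd start pulling data from ONTAP itself (syntax worth double-checking against your release, and the syslog destination is a placeholder):

event log show
system node autosupport show -node *
cluster log-forwarding create -destination 192.0.2.10 -port 514

The EMS event log and the AutoSupport payloads cover most of what is worth mining on-box, and log forwarding pushes events to an external syslog/SIEM target where the actual data mining can happen.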